Add co-occurrence analysis tool #8002

Merged
bgruening merged 9 commits into galaxyproject:main from ksuderman:cooccurrence-analysis
Jun 7, 2026

@ksuderman

Contributor

Summary

  • Adds co-occurrence analysis tool for analyzing word patterns from NLP output
  • Works with JSON output from both spaCy and Stanza NLP tools
  • Supports span-based and sentence-based co-occurrence analysis
  • Generates tabular output with frequencies and distances
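
As a rough illustration of the sentence-based mode and the tabular output described above, here is a minimal sketch. It assumes the input has already been reduced to a list of sentences, each a list of terms; the function names and the column names (`term1`, `term2`, `frequency`, `mean_distance`) are hypothetical and not taken from the tool itself.

```python
from collections import defaultdict
from itertools import combinations


def sentence_cooccurrence(sentences):
    """Count unordered term pairs within each sentence and sum their token distances."""
    # pair -> [frequency, total distance in tokens]
    stats = defaultdict(lambda: [0, 0])
    for sentence in sentences:
        for (i, a), (j, b) in combinations(enumerate(sentence), 2):
            if a == b:
                continue  # a term does not co-occur with itself
            pair = tuple(sorted((a, b)))
            stats[pair][0] += 1
            stats[pair][1] += j - i
    return stats


def to_tsv(stats):
    """Render the pair statistics as tab-separated text (hypothetical column names)."""
    lines = ["term1\tterm2\tfrequency\tmean_distance"]
    for (a, b), (freq, total) in sorted(stats.items()):
        lines.append(f"{a}\t{b}\t{freq}\t{total / freq:.2f}")
    return "\n".join(lines)


sentences = [["galaxy", "runs", "tools"], ["galaxy", "tools", "scale"]]
stats = sentence_cooccurrence(sentences)
print(to_tsv(stats))
```

Storing the summed distance next to the frequency keeps the mean distance cheap to compute at output time; the actual tool may report distances differently.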

Test plan

  • Tool passes planemo lint validation
  • Comprehensive test data included for both spaCy and Stanza input
  • README documentation provided
  • .shed.yml configured for IUC submission

🤖 Generated with Claude Code

ksuderman and others added 3 commits May 19, 2026 19:14
- Analyzes word co-occurrence relationships from NLP-annotated JSON
- Multiple methods: sentence-level, sliding window, dependency-based
- Works with spaCy, Stanza, or CoreNLP JSON output
- Flexible filtering: POS tags, stop words, custom stop word lists
- Term representation options: lemma, surface form, or lowercased
- Output formats: TSV pair list and optional co-occurrence matrix
- Pure Python implementation with no external dependencies
- Comprehensive tests and documentation
- Enables downstream network analysis and visualization

Tool: cooccurrence_analysis (v1.0.0+galaxy0)
Categories: Text Manipulation, Natural Language Processing
Citation: Manning & Schütze - Foundations of Statistical NLP
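
The filtering, term-representation, and matrix options listed in this commit message could look roughly like the sketch below. The token keys (`text`, `lemma`, `pos`), the mode names, the stop word list, and both function names are assumptions made for illustration; the tool's real parameters and JSON handling may differ.

```python
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop word list


def select_terms(tokens, keep_pos=("NOUN", "VERB"), mode="lemma"):
    """Filter tokens by POS tag and stop words, then pick a term representation."""
    terms = []
    for tok in tokens:
        if tok["pos"] not in keep_pos:
            continue
        if tok["text"].lower() in STOP_WORDS:
            continue
        if mode == "lemma":
            terms.append(tok["lemma"])
        elif mode == "lower":
            terms.append(tok["text"].lower())
        else:  # "surface": keep the token text unchanged
            terms.append(tok["text"])
    return terms


def build_matrix(sentences, **options):
    """Build a symmetric co-occurrence matrix from sentence-level pair counts."""
    counts = Counter()
    for tokens in sentences:
        terms = sorted(set(select_terms(tokens, **options)))
        counts.update(combinations(terms, 2))
    vocab = sorted({term for pair in counts for term in pair})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return vocab, matrix


sentences = [
    [
        {"text": "The", "lemma": "the", "pos": "DET"},
        {"text": "Tools", "lemma": "tool", "pos": "NOUN"},
        {"text": "ran", "lemma": "run", "pos": "VERB"},
    ],
    [
        {"text": "A", "lemma": "a", "pos": "DET"},
        {"text": "tool", "lemma": "tool", "pos": "NOUN"},
        {"text": "runs", "lemma": "run", "pos": "VERB"},
        {"text": "jobs", "lemma": "job", "pos": "NOUN"},
    ],
]
vocab, matrix = build_matrix(sentences)
```

Using lemmas merges "Tools"/"tool" and "ran"/"runs" into single terms, which is why the choice of representation changes the resulting counts.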
- Analyzes word co-occurrence patterns from spaCy/Stanza JSON output
- Supports both span-based and sentence-based co-occurrence analysis
- Generates tabular output with co-occurrence frequencies and distances
- Works with JSON output from both spaCy and Stanza NLP tools

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment thread tools/cooccurrence/.shed.yml Outdated
Comment thread tools/cooccurrence/cooccurrence.py
Comment thread tools/cooccurrence/cooccurrence.xml Outdated
</assert_contents>
</output>
</test>
<test expect_num_outputs="1">

Member

Can we have one of those tests with both outputs?

Comment thread tools/cooccurrence/macros.xml Outdated
@ksuderman

Contributor Author

Regarding the cooccurrence.py source: This implements standard co-occurrence algorithms from computational linguistics (sentence-level, sliding window, dependency-based). The implementation is original code written specifically for Galaxy to process spaCy/Stanza JSON outputs. The algorithms themselves are textbook NLP methods.
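
For readers unfamiliar with the textbook methods named above, the sliding-window and dependency-based variants can be sketched as follows. This is not the code from cooccurrence.py: the function names are hypothetical, and the dependency example assumes CoNLL-U style 1-based `head` indices with 0 marking the root, which may not match how the tool reads spaCy or Stanza JSON.

```python
from collections import Counter


def window_pairs(terms, window=2):
    """Pair each term with up to `window` terms that follow it."""
    counts = Counter()
    for i, left in enumerate(terms):
        for right in terms[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


def dependency_pairs(tokens):
    """Pair each token with its syntactic head (assumed 1-based, 0 = root)."""
    counts = Counter()
    for tok in tokens:
        if tok["head"] > 0:
            head = tokens[tok["head"] - 1]
            counts[tuple(sorted((tok["lemma"], head["lemma"])))] += 1
    return counts


win = window_pairs(["genes", "regulate", "protein", "expression"], window=2)
dep = dependency_pairs(
    [
        {"lemma": "gene", "head": 2},
        {"lemma": "regulate", "head": 0},
        {"lemma": "expression", "head": 2},
    ]
)
```

With a window of 2, "genes" and "expression" are three tokens apart and therefore never paired, whereas a sentence-level count would include them.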

- Update profile from 21.05 to 24.1
- Remove macros.xml and inline version
- Fix repository URL to point to IUC repository
- Convert test syntax to new conditional format
- Add ftype attributes to test outputs
- Add test with both outputs (pairs + matrix)
- Add license comment to Python script
@ksuderman

Contributor Author

Addressed all review comments

ksuderman and others added 2 commits May 20, 2026 12:25
Co-occurrence is a custom Galaxy tool without an upstream project,
so homepage_url should point to the tools-iuc repository
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment thread tools/cooccurrence/.shed.yml Outdated
Comment thread tools/cooccurrence/cooccurrence.xml Outdated
- Updated homepage_url and remote_repository_url to specific tool directory
- Fixed has_n_rows -> has_n_lines assertion
- (Other suggestions already implemented: conditional syntax, test with both outputs, macros inlined)

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment thread tools/cooccurrence/cooccurrence.xml Outdated
Comment thread tools/cooccurrence/.shed.yml Outdated

@bgruening bgruening left a comment

Member

Two last comments and suggestions :)

Thanks @ksuderman

ksuderman and others added 2 commits May 22, 2026 17:58
Co-authored-by: Björn Grüning <bjoern@gruenings.eu>
Co-authored-by: Björn Grüning <bjoern@gruenings.eu>
@bgruening bgruening merged commit 4f0ba58 into galaxyproject:main Jun 7, 2026
10 checks passed
@mvdbeek

mvdbeek commented Jun 7, 2026

Member
